Using machine learning methods to avoid the pitfall of cognates and false friends in Spanish-Portuguese word pairs
Authors
Abstract
About 85% of the Portuguese lexicon has Spanish cognates, and the linguistic structures of the two languages coincide to a large extent; both facts are believed to be an advantage for Spanish speakers learning Portuguese. However, these similarities also have drawbacks for learners, such as the pitfall of false friends, since about 20% of cognates are false. The aim of this article is to automatically identify cognates and false friends between Spanish and Portuguese in order to build dictionaries of these words. One use for such dictionaries is to support scientific writing tools, which can help lower barriers for Spanish speakers writing in Portuguese.
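The abstract does not say which features or learning algorithms the article uses, so the following is only a minimal sketch of one common starting point: an orthographic similarity score used to propose candidate pairs, combined with a meaning label to separate cognates from false friends. The function names, the 0.7 threshold, and the `same_meaning` flag (assumed to come from a bilingual dictionary or a human annotator) are all illustrative assumptions, not the authors' method.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Lowercase a word and drop combining marks (e.g. 'ç' -> 'c', 'ã' -> 'a')."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(es_word: str, pt_word: str) -> float:
    """Normalized orthographic similarity in [0, 1]; 1.0 means identical spelling."""
    a, b = strip_accents(es_word), strip_accents(pt_word)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def classify_pair(es_word: str, pt_word: str, same_meaning: bool,
                  threshold: float = 0.7) -> str:
    """Label a Spanish-Portuguese pair.

    `same_meaning` is a hypothetical external label (dictionary or annotator);
    `threshold` is an arbitrary illustrative cutoff, not a tuned value.
    """
    if similarity(es_word, pt_word) < threshold:
        return "unrelated"
    return "cognate" if same_meaning else "false friend"


# Toy examples: 'polvo' means dust in Spanish but octopus in Portuguese.
print(classify_pair("libro", "livro", same_meaning=True))
print(classify_pair("polvo", "polvo", same_meaning=False))
```

In a learned setting, a score like `similarity` would typically be one feature among several fed to a classifier rather than a fixed rule; the hand-set threshold here only stands in for that step.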
Similar resources
Disambiguation of partial cognates
Cognates – words that have similar spelling and meaning in two or more languages – can accelerate vocabulary acquisition and facilitate the reading comprehension task. A student has to pay attention to the pairs of words that look and sound similar but have different meanings – false-friend pairs, and especially to pairs of words that share meanings in some but not all contexts – partial cognat...
Unsupervised Extraction of False Friends from Parallel Bi-Texts Using the Web as a Corpus
False friends are pairs of words in two languages that are perceived as similar, but have different meanings, e.g., Gift in German means poison in English. In this paper, we present several unsupervised algorithms for acquiring such pairs from a sentence-aligned bi-text. First, we try different ways of exploiting simple statistics about monolingual word occurrences and cross-lingual word co-occ...
Cognate or False Friend? Ask the Web!
We propose a novel unsupervised semantic method for distinguishing cognates from false friends. The basic intuition is that if two words are cognates, then most of the words in their respective local contexts should be translations of each other. The idea is formalised using the Web as a corpus, a glossary of known word translations used as cross-linguistic “bridges”, and the vector space model...
Multilingual lexical resources to detect cognates in non-aligned texts
The identification of cognates between two distinct languages has recently started to attract the attention of NLP research, but there has been little research into using semantic evidence to detect cognates. The approach presented in this paper aims to detect English-French cognates within monolingual texts (texts that are not accompanied by aligned translated equivalents), by integrating word...
Exploring cognition processes in second language acquisition: the case of cognates and false-friends in EST
This article explores one aspect of the processing perspective in L2 learning in an EST context: the processing of new English content words, specifically cognates and false friends, by Spanish-speaking engineering students. The paper does not try to offer a comprehensive overview of language acquisition mechanisms, but rather it is intended to review more narrowly how our conceptual sys...